1 Introduction

The purpose of this kernel is to begin working out methods for plotting census and police department data together on the same map. It starts with dot density maps, both because they are particularly helpful for displaying census data and because the transformations they require also make many other visualizations easier to produce. Only the datasets related to Boston are used to begin with. The packages used include tidycensus for downloading census data in a form that is quick to map, googleway for geocoding locations from their addresses, and tmap for mapping.

1.2 Loading data

Here we will load in the data we are going to use from the provided files.
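As a rough sketch of the shapefile step, the districts can be read with `sf::st_read`. This is a swapped-in reader rather than the one that produced the console output in this section, and `read_districts` is a hypothetical helper name. The path and layer name are the ones that appear in that output, so the path will need changing on any other machine.

```r
library(sf)

# Hypothetical helper (not from the original kernel): read one layer of a
# shapefile directory into an sf object.
read_districts <- function(dsn, layer) {
  st_read(dsn = dsn, layer = layer, quiet = TRUE)
}

# Path and layer name copied from the console output in this section;
# point shp_dir at wherever the provided files live on your machine.
shp_dir <- "E:/Kaggle for good/data/Dept_11-00091/11-00091_Shapefiles"
if (dir.exists(shp_dir)) {
  districts <- read_districts(shp_dir, "boston_police_districts_f55")
  print(nrow(districts))  # number of district features that were read
}
```

Reading into an sf object keeps the districts in the same class that tidycensus returns for its geometries, which makes the later layering simpler.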

## OGR data source with driver: ESRI Shapefile 
## Source: "E:\Kaggle for good\data\Dept_11-00091\11-00091_Shapefiles", layer: "boston_police_districts_f55"
## with 12 features
## It has 5 fields

So far we have only loaded the police department data and the shapefile containing the police districts. The census data will be brought in with tidycensus rather than from the provided files.

2 Census Stuff

The provided ACS tables were not used, for a few reasons. First, they are subject tables, as indicated by the prefix ‘S’. Because these tables contain data already aggregated by the US Census Bureau, they can be more difficult to use in mapping and other applications, and they are less detailed. It is often more useful to begin with the base tables, indicated by the ‘B’ prefix, since these contain the most detailed information in the ACS. Second, the annotations were merged into the data file itself. Annotations can make a table harder to read into R; for instance, a revision is sometimes annotated in parentheses within an otherwise numeric column. Lastly, in the case of Boston, the provided census information appears to be for Worcester County rather than Suffolk County, where Boston is located.

Instead, the information will be pulled using tidycensus. This provides a number of advantages. It brings in all the relevant geographies, making it unnecessary to download the shapefiles for the entire state and filter down to the county level. It also allows census data to be downloaded using the names of the counties and the abbreviations for the states instead of looking up their FIPS codes.

Bringing in tidycensus:
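A minimal sketch of what this pull might look like with `get_acs`. The table ID B15003 (educational attainment for the population 25 and over) is an assumption about which base table is wanted, `fetch_education` is a hypothetical helper name, and the survey year is left at the package default. A Census API key is required, so the call is guarded.

```r
library(tidycensus)

# Hypothetical helper: tract-level ACS estimates for Suffolk County, MA,
# with the tract polygons attached. B15003 is an assumed table choice.
fetch_education <- function(table = "B15003") {
  get_acs(
    geography = "tract",
    table     = table,
    state     = "MA",       # state abbreviation instead of a FIPS code
    county    = "Suffolk",  # county name instead of a FIPS code
    geometry  = TRUE,       # return an sf object with tract geometries
    output    = "wide"      # one column per variable: ...E estimate, ...M margin of error
  )
}

# tidycensus reads the key from the CENSUS_API_KEY environment variable.
if (nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  boston_acs <- fetch_education()
}
```

Requesting the wide output keeps one row per tract, which is the shape the mapping steps expect.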

Next we need to do some cleaning up to make mapping easier. We will begin by dropping the ‘margin of error’ columns, since they won’t be used for mapping, and then combine the large number of categories into a smaller, easier-to-use set.
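The cleaning step could be sketched as follows on a toy table. The values are made up, the variable codes assume table B15003 in wide format, and the two combined categories (`high_school`, `bachelors_plus`) are illustrative names rather than the groupings actually used in this kernel.

```r
library(dplyr)

# Toy stand-in for a wide tidycensus result (all values are made up).
acs_wide <- tibble(
  GEOID       = c("25025000100", "25025000201"),
  B15003_017E = c(400, 250), B15003_017M = c(60, 45),  # regular high school diploma
  B15003_018E = c(50, 30),   B15003_018M = c(20, 15),  # GED or alternative credential
  B15003_022E = c(300, 500), B15003_022M = c(55, 70),  # bachelor's degree
  B15003_023E = c(120, 310), B15003_023M = c(40, 60)   # master's degree
)

acs_clean <- acs_wide %>%
  # Drop margin-of-error columns: variable code followed by a trailing "M".
  select(-matches("_\\d{3}M$")) %>%
  # Collapse detailed categories into broader ones.
  mutate(
    high_school    = B15003_017E + B15003_018E,
    bachelors_plus = B15003_022E + B15003_023E
  ) %>%
  select(GEOID, high_school, bachelors_plus)
```

Matching on the full code pattern rather than a bare trailing "M" avoids accidentally dropping any identifier column whose name happens to end in that letter.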

Now, we can begin by plotting our base map using tmap:
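A small sketch of a base map in tmap. Two toy squares stand in for the census tracts here, since the real tract object needs a Census API key to download; `rect` and `tracts` are illustrative names.

```r
library(sf)
library(tmap)

# Toy stand-in for the census tracts: two unit squares side by side.
rect <- function(x0, x1) {
  st_polygon(list(rbind(c(x0, 0), c(x1, 0), c(x1, 1), c(x0, 1), c(x0, 0))))
}
tracts <- st_sf(
  GEOID    = c("A", "B"),
  geometry = st_sfc(rect(0, 1), rect(1, 2), crs = 4326)
)

tmap_mode("plot")  # static map
base_map <- tm_shape(tracts) + tm_polygons()
# In an interactive session, print(base_map) draws the map.
```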

We also have the police district shapefile, which we can plot over the census tracts.
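Layering the districts over the tracts might look like the sketch below. The geometries are toy rectangles and the `DISTRICT` field name is hypothetical. The districts are first transformed to the tracts' coordinate reference system, which is a precaution in case the two sources differ.

```r
library(sf)
library(tmap)

# Toy rectangles standing in for the real layers (coordinates are made up).
rect <- function(x0, x1) {
  st_polygon(list(rbind(c(x0, 0), c(x1, 0), c(x1, 1), c(x0, 1), c(x0, 0))))
}
tracts <- st_sf(
  GEOID    = c("A", "B"),
  geometry = st_sfc(rect(0, 1), rect(1, 2), crs = 4326)
)
districts <- st_sf(
  DISTRICT = "D1",  # hypothetical field name
  geometry = st_sfc(rect(0, 2), crs = 4326)
)

# Put both layers in the same coordinate reference system before combining.
districts <- st_transform(districts, st_crs(tracts))

overlay_map <- tm_shape(tracts) + tm_polygons() +
  tm_shape(districts) + tm_borders(col = "red", lwd = 2)
```

Each `tm_shape` call starts a new layer, so the district borders are drawn on top of the tract polygons.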

3 Creating dot density maps

Now we’re going to create dot density maps of our education data using the categories we just created. An example of these types of maps for census data can be found at the NYTimes Mapping Segregation page, https://www.nytimes.com/interactive/2015/07/08/us/census-race-map.html.

A helpful tutorial for creating these maps, from which a lot of this is drawn, can be found here: https://www.cultureofinsight.com/blog/2018/05/02/2018-04-08-multivariate-dot-density-maps-in-r-with-sf-ggplot2/

Below is the code to begin creating the dots we will use for our map. This includes a custom function, found at the above link, that applies a random rounding algorithm to the floats to avoid any systematic bias in the overall dot counts. Because creating a dot for every single person represented in the ACS would be time-intensive, we will start by having each dot represent 25 people.
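The random rounding idea can be sketched in base R as follows. This is a re-implementation in the spirit of the tutorial's function rather than a copy of it, and the tract counts are made up. Each value is rounded up with probability equal to its fractional part, so the expected dot count matches the unrounded value.

```r
# Stochastic rounding: round up with probability equal to the fractional
# part, otherwise round down, so that E[random_round(x)] equals x.
random_round <- function(x) {
  whole <- floor(x)
  frac  <- x - whole
  bump  <- runif(length(x)) < frac       # TRUE with probability `frac`
  value <- whole + bump
  value[is.na(value) | value < 0] <- 0   # missing or negative counts become 0
  as.integer(value)
}

people_per_dot <- 25
set.seed(42)  # only so the sketch is reproducible

# Toy tract counts (made up), one element per tract.
high_school <- c(450, 280, 12, 0)
dots_per_tract <- random_round(high_school / people_per_dot)
```

For example, a tract with 280 people yields 11.2 dots before rounding, so it receives 11 dots most of the time and 12 dots about one time in five. The resulting per-tract counts are what would then be handed to a point sampler such as `sf::st_sample`, whose `size` argument accepts one sample size per feature.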

4 Getting location data for police incidents in Boston

Since Boston doesn’t provide latitude and longitude for its incidents, we will geocode the LOCATION and CITY fields to try to get plottable locations. This is far from perfect given how the address information has been recorded, and you can see that some of the plotted locations likely aren’t accurate, but it is enough to get started. That information will be looked at in more detail in the next version of this kernel.

We are also going to begin with just the ‘HOMICIDE’ incidents from our data, since this is a more manageable number for starting out.

Note: To run the following code on your own without error, you will need a Google Maps API key.
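A hedged sketch of the geocoding step with googleway. The incident rows are made up, `INCIDENT_TYPE` is a hypothetical name for the column holding the incident category, `geocode_one` is a hypothetical helper, and `GOOGLE_MAPS_API_KEY` is an assumed environment variable name. Only LOCATION and CITY come from the data description above.

```r
library(googleway)

# Toy incident records (made up). LOCATION and CITY are the fields named in
# the text; INCIDENT_TYPE is a hypothetical column name.
incidents <- data.frame(
  INCIDENT_TYPE = c("HOMICIDE", "LARCENY", "HOMICIDE"),
  LOCATION      = c("1 Example St", "2 Sample Ave", "3 Placeholder Rd"),
  CITY          = c("Boston", "Boston", "Boston"),
  stringsAsFactors = FALSE
)

# Keep only homicides and build one address string per incident.
homicides <- incidents[incidents$INCIDENT_TYPE == "HOMICIDE", ]
homicides$address <- paste(homicides$LOCATION, homicides$CITY, "MA", sep = ", ")

# Hypothetical helper: geocode one address and return the first match's
# coordinates, or NA values when the API reports no usable result.
geocode_one <- function(address, key) {
  res <- google_geocode(address = address, key = key)
  if (!identical(res$status, "OK")) {
    return(data.frame(lat = NA_real_, lng = NA_real_))
  }
  res$results$geometry$location[1, c("lat", "lng")]
}

key <- Sys.getenv("GOOGLE_MAPS_API_KEY")  # assumed variable name
if (nzchar(key)) {
  coords <- do.call(rbind, lapply(homicides$address, geocode_one, key = key))
  homicides <- cbind(homicides, coords)
}
```

Taking only the first match is a simplification; ambiguous addresses are one likely source of the inaccurate locations mentioned above.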

5 Putting it all together

Finally, let’s put it all together. tmap’s ‘view’ mode is going to be used to provide some interaction and to let the user select which layers they would like to see. Additionally, I’m still trying to figure out better ways to have the dots and bubbles scale.
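One way the combined map could be assembled in ‘view’ mode is sketched below. All four layers are toy data with made-up coordinates, and the column `n` used to size the bubbles is hypothetical. The `group` argument gives each layer its own entry in the interactive layer selector.

```r
library(sf)
library(tmap)

# Toy layers (all values made up) standing in for the real objects.
rect <- function(x0, x1) {
  st_polygon(list(rbind(c(x0, 0), c(x1, 0), c(x1, 1), c(x0, 1), c(x0, 0))))
}
tracts <- st_sf(GEOID = c("A", "B"),
                geometry = st_sfc(rect(0, 1), rect(1, 2), crs = 4326))
districts <- st_sf(DISTRICT = "D1",
                   geometry = st_sfc(rect(0, 2), crs = 4326))
dots <- st_as_sf(data.frame(x = c(0.2, 0.7, 1.4), y = c(0.3, 0.8, 0.5)),
                 coords = c("x", "y"), crs = 4326)
homicides <- st_as_sf(data.frame(n = c(1, 3), x = c(0.5, 1.5), y = c(0.5, 0.4)),
                      coords = c("x", "y"), crs = 4326)

tmap_mode("view")  # interactive map with a layer selector

full_map <-
  tm_shape(tracts)    + tm_borders(group = "Census tracts") +
  tm_shape(districts) + tm_borders(lwd = 2, group = "Police districts") +
  tm_shape(dots)      + tm_dots(size = 0.05, group = "Education dots") +
  tm_shape(homicides) + tm_bubbles(size = "n", group = "Homicides")
# In an interactive session, print(full_map) opens the map in the viewer.
```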

6 Finishing Thoughts and Next Steps

There are still many issues to sort out within the data, and some to figure out in order to make this more presentable. The next steps will be to plot this information for the other cities in the dataset and see whether the issues presented by Boston are similar or different in those cases.